Skip to content

Conversation

@softcookiepp
Copy link

As the title says, the creation of vk::ShaderModule objects has been moved from kp::Algorithm kp::Module, a new class.
This will make it easier to implement caching of vk::ShaderModule objects, as well as add aditional features for shader module metadata handling in the future.

Copy link
Member

@axsaucedo axsaucedo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done an initial review, seems like still a way to go, there's a few inconsistencies that need to be addressed, as well as caveats on existing approach and coding style. There also doesn't seem to be a fully complete implementation as this suggests an optionally owned resource but currently the PR does not introduce the functionaly to make it optionally owned. We also need to ensure new functionality is tested, especially in contexts where these resources are owned, and particularly assessing the memory management / lifecycle aspect.

@@ -0,0 +1,27 @@
#pragma once
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is quite random, what is this here for - remove

/*
* getter for mShaderModule
*/
vk::ShaderModule& getShaderModule() { return mShaderModule; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be const and should not return a mutable reference

public:

/*
* Constructor accepting a device and a SPIR-V binary
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment conventions should align to rest of codebase - see manager.hpp

// the vulkan device; not owned by this object
std::weak_ptr<vk::Device> mDevice;

// the shader module handle
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superfluous comment, see other classes for examples

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • needs clarity on ownerhsip as this is sometimes owned resource

* Wrapper for Vulkan's shader modules.
* The purpose of this is to manage the module lifetime, while
* building the groundwork for easily integrating things like
* SPIR-V reflection and multiple entry points in the future.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewrite as description of class as of now not for future / groundwork

* building the groundwork for easily integrating things like
* SPIR-V reflection and multiple entry points in the future.
*/
class Module : public std::enable_shared_from_this<Module>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming convention here is not consistent, should be Shader as per file - module is a generic keyword

bool mFreeDescriptorSet = false;
std::shared_ptr<vk::ShaderModule> mShaderModule;
bool mFreeShaderModule = false;
std::shared_ptr<Module> mModule = nullptr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is suggested as optionally owned resource but it's not handled as such, follow the same convention as per the rest of the optionally owned resources

src/Shader.cpp Outdated
spv.size());
vk::ShaderModuleCreateInfo shaderModuleInfo(vk::ShaderModuleCreateFlags(),
sizeof(uint32_t) * spv.size(), spv.data());
this->mDevice.lock()->createShaderModule(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we don't use device.lock throughout the codebase, let's ensure consitency

class Module : public std::enable_shared_from_this<Module>
{
// the vulkan device; not owned by this object
std::weak_ptr<vk::Device> mDevice;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Device is not a weakptr as we need to ensure that this resource doesnt outlive the lifecylce by storing as strong reference and/or ensuring this is handled correctly

src/Shader.cpp Outdated
{
KP_LOG_DEBUG("Kompute Module destructor started");
KP_LOG_DEBUG("Kompute Module Destroying shader module");
if (!mDevice.expired() )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensure consistency with other classes, including coding style

@softcookiepp
Copy link
Author

Alright, that should cover everything.

@softcookiepp softcookiepp requested a review from axsaucedo January 8, 2026 19:43
Copy link
Member

@axsaucedo axsaucedo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @softcookiepp - all tests are failing, let's make sure tests pass locally before re-submitting

@softcookiepp
Copy link
Author

Right now I cannot even get the tests to run on my system due to this issue: #445
I will likely have to fix this as well before merging; is that alright?

@axsaucedo
Copy link
Member

Oh right I see, you could also try running it with act https://github.com/nektos/act, as the CI runs with SwiftShader (https://github.com/KomputeProject/kompute/blob/master/docker-builders/Swiftshader.Dockerfile); or you can just set up SwiftShader, which allows you to run against CPU instead.

@softcookiepp
Copy link
Author

Thanks! Will do.

@softcookiepp
Copy link
Author

softcookiepp commented Jan 12, 2026

Alright, I got all the tests passing. I also changed the vk::Device queue family selection criteria from just vk::QueueFlagBits::eCompute to (vk::QueueFlagBits::eCompute | vk::QueueFlagBits::eTransfer). In practice most queue families supporting vk::QueueFlagBits::eCompute, will also support vk::QueueFlagBits::eTransfer, but there are always edge cases. It is never a good idea to assume the driver will take care of everything in Vulkan; the only reason the test with multiple queues passed at all is because you got lucky with the way your drivers handle it!

The queue family selection in general could use some reworking to prevent invalid API usage (see here: #445 (comment)) but I will leave that for another day.

Copy link
Member

@axsaucedo axsaucedo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further changes requested - some strange changes that don't have anything to do with the PR, let's avoid this. Let's also make sure that this code is tested by evaluating an Algo rebuild as this would trigger the full cycle.

for (uint32_t i = 0; i < numParallel; i++) {
inputsAsyncB.push_back(mgr.tensor(data));
algosAsync.push_back(mgr.algorithm({ inputsAsyncB[i] }, spirv));
inputsAsyncB.push_back(mgrAsync.tensor(data));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you changing this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test was wrong and was causing my driver to crash. It was trying to use different kp::Manager (and therefore different underlying vk::Device) instances for allocating buffers, creating pipelines and executing them. You can't do this, even if the underlying hardware is the same. If you read the Vulkan validation errors while running the tests, you would see this:

[Jan  8 2026 11:08:47] [debug] [Manager.cpp:42] [VALIDATION]: Validation - vkCmdPipelineBarrier(): pBufferMemoryBarriers[0].buffer (VkBuffer 0x280000000028) was created, allocated or retrieved from VkDevice 0x5a2129296950, but command is using (or its dispatchable parameter is associated with) VkDevice 0x5a2129386e80

My change here fixes this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still isn't completely fixed; see here for details: #445 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is not "fixing" anything. This is a test that is currently set up on specific hardware, which is testing parallel execution. What you are doing is just not running the test. This is not "fixing" something.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still crashes, but the validation error in my message no longer appears when I attempt to run the test. So it is a step in the right direction towards compliant API usage, but not a complete fix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to see what this validation error may be, but your change basically makes this test redundant, so it's not correct. If you read through the test you are making it to not compare anything; the test is profiling a parallel queue with a non-parallel queue on a specific GPU that actually supports parallel processing (not async, parallel). Here's more info: https://medium.com/data-science/parallelizing-heavy-gpu-workloads-via-multi-queue-operations-50a38b15a1dc

Copy link
Author

@softcookiepp softcookiepp Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The simplified reasoning is that you are taking an underlying vk::PhysicalDevice and creating two different vk::Device instances with it, since each kp::Manager creates its own vk::Device. Though these may have the same physical GPU, the Vulkan driver may initialize separate resources every time a vk::Device is created for a given GPU.

What do these resources look like? It really depends on the driver. NVIDIA drivers (which given your comments, it seems you have) are more robust in this regard, while AMD drivers (which I am using right now) are quite a bit more unforgiving about things like this.

So when you try to use memory and pipelines allocated with mgr (which has its own vk::Device instance) with mgrAsync (which has a totally separate vk::Device), you are violating one of the core assumptions that Vulkan drivers make. You may as well be asking for it to mix and match resources created on different physical GPUs.

As for why the test is redundant now, I am really not sure; both mgr and mgrAsync are still used in this test. I just made sure mgrAsync did not accept tensors and algorithms created with mgr. If you could point to exactly which lines break the test and why, that would be very much appreciated.

Copy link
Member

@axsaucedo axsaucedo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few more comments but seems almost there. Are there any tests that we can add as well? Seems to me this segfault piece would benefit from one. We also need to make sure that the documentation is updated. Once these are resolved we should be able to merge. Thank you.

@axsaucedo
Copy link
Member

Thanks @softcookiepp - almost there, I just added a follow up reply to claify the tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants